Sequence-oriented sensitive analysis for PM2.5 exposure and risk assessment using interactive process mining

The World Health Organization has estimated that air pollution will be one of the most significant environmental challenges in the coming years, and air quality monitoring and climate change mitigation actions have been promoted under the Paris Agreement because of their impact on mortality risk. Thus, it is essential to develop a methodology that supports experts in making decisions based on exposure data, identifying exposure-related activities, and proposing mitigation scenarios. In this context, Interactive Process Mining, a discipline that has progressed in healthcare in recent years, could help to develop a methodology based on human knowledge. For this reason, we propose a new methodology for a sequence-oriented sensitive analysis that identifies the most effective activities and parameters on which to base a mitigation policy. This methodology is innovative in the following points: i) this paper presents the first application of Interactive Process Mining to the mitigation of personal exposure to pollution; ii) our solution reduces the computational cost and time of traditional sensitivity analysis; iii) the methodology is human-oriented, in the sense that the process is carried out together with the environmental expert; and iv) our solution has been tested with synthetic data, taking the city of Valencia as the use case, to explore its viability before moving to physical exposure measurements and to overcome the difficulty of performing such measurements. This dataset has been generated with a model that considers the demographic and epidemiological statistics of the city of Valencia. We have demonstrated that assessments based on sequence-oriented sensitive analysis can identify target activities. The proposed scenarios improve the initial KPIs: in the best scenario, population exposure is reduced by 18% and relative risk by 12%.
Consequently, our proposal could be used with real data in future steps, becoming an innovative tool for air pollution mitigation and environmental improvement.

R1. We agree with the reviewer that the role of sequences in the analysis could be explained in more detail. The key point is that sequence analysis considers the contribution of each activity and makes it possible to compute the exposure of each citizen. This avoids population metrics that could be biased by the sampling process. Such bias can also affect sequence analysis, but in that case experts in interactive process mining can track it along the individual sequences. The results of the analysis "without edges between nodes" can be compared with ANOVA results, and they are similar; the strength of our approach is that its results are more explainable. Thus, the interactive process mining workflow is more agile.
To clarify this, the following sentences have been added to the introduction.
"In our view, this limits the model considerably and leads to a loss of information, because it is not possible to track how pollution affects each individual or to compute individual KPIs. Thus, sequential data and sequence-oriented analysis are critical to overcome these limitations."
The following explanation has been added to the second paragraph of the Sequence-Oriented Sensitive Analysis subsection.
"The main goal of this work is to perform a sensitivity analysis to reduce exposure and adverse health outcomes by acting on activities. In other words, the main goal is to identify which activities contribute most to population exposure. Sequence analysis is essential for our proposed optimisation because it avoids population metrics that could be biased by the sampling process. It also makes it possible to analyse different population groups (for instance, only people with cardiac conditions) or even to track and optimise the activities of a single individual [1]. In addition, classical sensitivity analysis requires running many simulations over different combinations of inputs to find those that minimise a set of KPIs. The computational cost of this analysis can be high because of the number of input parameters to consider. Thus, in this paper we propose a new methodology to perform a sequence-oriented sensitive analysis using interactive process mining (IPM)."
Finally, the following contribution has been added to the third paragraph of the Discussion.
"This sequence-oriented sensitive analysis presents several advantages over classical methodologies. Its main strength lies in the sequence data and sequence analysis, because experts in interactive process mining can track individual sequences. The results of the analysis without edges between nodes can be compared with ANOVA results, and they are similar, but the strength here is that the results are more explainable. For instance, by studying the sequence traces, it is possible to conclude that acting on building infrastructure is quite effective, because this pattern appears in all the sequences. The same applies to studying less successful scenarios. Thus, the interactive process mining workflow is more agile, explainable, and human-oriented, avoiding the black-box effect. This new methodology identifies the target parameters, so it is only necessary to iterate over these few inputs, whereas the traditional approach iterates over all the parameters to find the best combination. However, this hypothesis should be tested by conducting a pilot with real data. The approach could also be extended to forecasting, using chemistry transport models to generate predictions over gridded domains [2], and to other pollutants, as has been done in Paris [3]. Furthermore, and most importantly, sequence-oriented sensitive analysis is a more intelligent approach based on understandable models and human criteria. However, this methodology has some potential limitations: some parameters that contribute crucially to exposure may be missed, and the outcomes are susceptible to human misinterpretation."
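The per-citizen computation and the targeted iteration described in R1 can be illustrated with a minimal Python sketch. It is not the authors' implementation and not the PALIA algorithm; it assumes that each citizen's day is a sequence of (activity, duration, PM2.5 concentration) steps, uses a time-weighted average as the individual exposure KPI, ranks activities by their share of the population's integrated exposure, and evaluates a hypothetical mitigation scenario on the top-ranked activity only. The toy data, the 0.7 reduction factor, and the parameter counts are invented for illustration and are not results from the study.

```python
from collections import defaultdict

# One step of a citizen's daily sequence: (activity, duration in hours, PM2.5 in µg/m³).
ActivitySequence = list[tuple[str, float, float]]

# Toy input invented for illustration only; it is NOT the study's synthetic Valencia dataset.
citizens: dict[str, ActivitySequence] = {
    "c1": [("home", 14.0, 10.0), ("commute", 1.0, 40.0), ("work", 9.0, 15.0)],
    "c2": [("home", 20.0, 12.0), ("commute", 2.0, 35.0), ("outdoor", 2.0, 25.0)],
}


def individual_exposure(seq: ActivitySequence) -> float:
    """Time-weighted average PM2.5 concentration along one citizen's sequence."""
    total_hours = sum(hours for _, hours, _ in seq)
    return sum(hours * conc for _, hours, conc in seq) / total_hours


def mean_exposure(pop: dict[str, ActivitySequence]) -> float:
    """Assumed population KPI: mean of the per-citizen exposures."""
    return sum(individual_exposure(seq) for seq in pop.values()) / len(pop)


def activity_shares(pop: dict[str, ActivitySequence]) -> dict[str, float]:
    """Fraction of the population's integrated exposure (hours x µg/m³) due to each activity."""
    dose: dict[str, float] = defaultdict(float)
    for seq in pop.values():
        for activity, hours, conc in seq:
            dose[activity] += hours * conc
    total = sum(dose.values())
    return {activity: d / total for activity, d in dose.items()}


def apply_scenario(
    pop: dict[str, ActivitySequence], activity: str, factor: float
) -> dict[str, ActivitySequence]:
    """Hypothetical mitigation: scale the concentration of one activity by `factor`."""
    return {
        cid: [(a, h, c * factor if a == activity else c) for a, h, c in seq]
        for cid, seq in pop.items()
    }


def full_factorial_runs(n_params: int, levels: int) -> int:
    """Simulations needed to test every combination of `levels` values per parameter."""
    return levels ** n_params


if __name__ == "__main__":
    shares = activity_shares(citizens)
    target = max(shares, key=shares.get)  # activity with the largest exposure share
    baseline = mean_exposure(citizens)
    mitigated = mean_exposure(apply_scenario(citizens, target, 0.7))
    print(target, round(100 * (baseline - mitigated) / baseline, 1))  # home 16.9
    # Exhaustive grid over 10 parameters vs. iterating only on 2 targeted ones.
    print(full_factorial_runs(10, 3), full_factorial_runs(2, 3))  # 59049 9
```

In this toy setting, ranking activities first and then varying only the targeted ones keeps the number of simulations small: with 3 levels per parameter, an exhaustive grid over 10 parameters needs 59,049 runs, whereas iterating on 2 targeted parameters needs 9.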

1. Zhang L, Guo C, Jia X, Xu H, Pan M, et al. (2018)

R2. We agree with the reviewer that interactive process mining always involves an evaluation by a team. This was indeed done in this work. The approach has been validated with urban planning experts from Libelium, an IoT company that works on sustainability impact assessment in Smart Cities. The interactive process mining was performed by Eduardo Illueca Fernández (UM) and Carlos Fernández Llatas (ITACA-SABIEN), supervised by Fernando Seoane Martinez (KI). Antonio Jesús Jara Valera (Libelium) derived the suggestions for the target scenario as an expert in sustainability impact assessment, and the validation was performed by recomputing the KPIs, which showed improvements.
To clarify this, the following paragraph (third paragraph) and Figure 2 have been added to the manuscript.

Q4. Figures are partially pixelated and not readable
R4. We thank the reviewer for highlighting this. All the figures in the manuscript have been re-edited to guarantee high-quality resolution and checked with the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool (https://pacev2.apexcovantage.com/), as suggested by PLOS ONE. We also noticed that, when uploading the figures to the supplementary materials, the files for Figure 5 and Figure 6 were duplicated (Figure 6 was uploaded twice); we have now corrected this.
Q5. The figures should be better described. Percentages at edges in process models were not clear.

R5. We agree with the reviewer that the figures could be described more exhaustively. The manuscript has been updated with extended captions. The percentages at the edges of the process maps have been removed to improve the quality of the figures.
Q6. What were the reasons that led to the selection of the PALIA algorithm?
R6. We agree with the reviewer that the reasons for selecting PALIA were not justified in the manuscript. In general terms, we chose the PALIA algorithm after a literature review, which showed that this algorithm best fits the goals of the work. Some sentences have been added to the fourth paragraph of the Sequence-Oriented Sensitive Analysis subsection to clarify this: "In this work, we use the PALIA algorithm, implemented by the Institute of Information and Communication Technologies (ITACA) of the Universidad Politecnica de Valencia, Valencia, Spain. This algorithm is the most appropriate one for our goals, because it is based on activity-based process mining and produces explainable process maps [8]. In addition, it performs better, in terms of efficacy, than other process mining algorithms, such as the heuristic miner [9] or genetic process mining [10]."

8. Fernandez

R7. We agree with the reviewer that an additional figure would clarify how interactive process mining was applied and evaluated. For this reason, we have added a new Figure 2 to the Sequence-Oriented Sensitive Analysis subsection, as explained in more detail in Q1.2.

Q8. Typo in "oriented sensitive analysis" -> sequention-oriented? semiautomatic -> semi-automatic? adition -> addition? Partially inconsistent British/American English. expsure
R8. We thank the reviewer for highlighting these typos, which have been corrected to ensure consistent language in the manuscript.

Reviewer #2
This reviewer did not ask for changes in the manuscript.
Editorial comments
QE1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf
RE1. We thank the editor for highlighting the importance of the formatting requirements. We have carefully reviewed the format of our manuscript and applied changes where necessary.
QE2. Thank you for stating the following in the Acknowledgments Section of your manuscript: We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: The author EIF has received funded from Fundacion Séneca (https://fseneca.es/), grant number 21300/FPI/19. The authors have received funded from EIT Health (https://eithealth.eu/), grant number 220649. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript Please include your amended statements within your cover letter; we will change the online submission form on your behalf.
RE2. We thank the editor for noting this. According to your instructions, we have removed the Acknowledgments section and moved the funding information to the cover letter. The following lines have been added to the cover letter.
QE3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.
RE3. Dear Editor, we have uploaded our data and code to a Zenodo repository with the following digital object identifier (DOI): https://doi.org/10.5281/zenodo.8079155 [11], and it is indexed in OpenAIRE. As you suggested in your feedback, this information has been updated in the Code Availability section of the manuscript and in the cover letter.
QE4. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article's retracted status in the References list and also include a citation and full reference for the retraction notice.

Illueca Fernández E, Fernandez
RE4. Dear Editor, we have exhaustively reviewed the whole bibliography to ensure that it contains no retracted articles. For clarity, DOIs have been added to all the references where possible. The new articles added to the manuscript are specified in the corresponding answers of this rebuttal letter. The following paper has been removed from the manuscript because it does not have a consistent DOI: